Abstract

Suppose you and I are both editing a document. You make a change and I make a
change, concurrently. Now if we still want to be seeing the same document then I need
to apply your change after mine and you mine after yours. But we can't just apply them
willy-nilly. I must amend yours somehow and you mine. If my change is written $\Delta$,
yours $\delta$, my amended change $\delta.\Delta$ and yours $\Delta.\delta$, we get
$*\Delta * \Delta.\delta = *\delta * \delta.\Delta$ as long as
application is written $*$ and we don't care about what we're applying the changes to.
We start by proving this identity for single changes and finish by proving it for many.

As I was eating, I saw Salvatore in one corner, obviously having made his peace with the cook, 
for he was merrily devouring a mutton pie. He ate as if he had never eaten before in his life, not 
letting even a crumb fall... He winked at me and said, in that bizarre language of his, that he was 
eating for all the years when he had fasted. - Umberto Eco, The Name of the Rose 

1 Introduction 


Consider two users making changes via their local clients to a document held on a remote server. 
It is the job of the software running on both the clients and the server to merge these changes 
somehow. The diff3 algorithm, recently formalised [2], will attempt to merge these changes by 
comparing them with the original document. It is designed to ensure that any changes that are 
merged without further recourse to the user do not conflict. If conflicts do occur, these are flagged 
and reported to both users, one of whom must make the necessary efforts to resolve them.

What if such an opportunity for user intervention is not possible or appropriate, however? Is
it possible to merge all changes without conflict under the most general conditions? We show that
the answer is a qualified yes, with two provisos. Firstly, the changes need to be handled in order. 
Secondly, it must be accepted that the result of merging the changes may not make immediate
sense. This is the inevitable consequence of avoiding all conflicts, but there are situations when this
is acceptable. Google Docs [1], for example, provides a collaborative tool for editing documents
in real time. A document's integrity, enshrined by the diff3 algorithm, is passed over in favour
of a more direct approach. Here we formalise a similar approach and prove that, given the above
provisos, any number of concurrent changes to a document by any number of users can be merged
without conflicts and at a relatively low computational cost.

2 Background 

Suppose users 1 and 2 begin with the same instance s of a document, and suppose both insert 
the character 'a' at differing positions, user 1 at position 12 and user 2 at position 27. When 
each character is inserted, the client generates a "diff" representing the insert, namely i(12, a) and 
i(27, a) for users 1 and 2, respectively. Each user now sees a different document, which we represent 
by way of the original instance of the document with the relevant diff applied to it, so s * i(12, a) 
and s * i(27, a) for users 1 and 2, respectively. 

It is clear that each client cannot necessarily apply the other user's diff without first amending
it. The i(27, a) diff cuts the original document s at position 27, for example, and if it is to affect
the document s * i(12, a) in the same way it must cut it at position 28. We say that the i(12, a)
diff "lifts" the i(27, a) diff and write $i(28, a) = i(27, a) \uparrow i(12, a)$. On the other hand, the i(12, a)
diff cuts the original document s at position 12 and if it is to affect the document s * i(27, a) in
the same way it must still cut it at position 12. There is no need for it to be lifted, therefore.
These amended diffs, one lifted, the other left unchanged, can then be applied to the respective
documents with the end result being the same:

s * i(12, a) * i(28, a) = s * i(27, a) * i(12, a) (2.1) 

We can extend the concept of lifting to deletes. Suppose that user 1 deletes the 5th character from
the original document s and user 2 deletes the 16th character. The clients again generate diffs,
this time d(5, 1) and d(16, 1) for users 1 and 2, respectively. If we think along the same lines as the
previous argument we see that the d(16, 1) diff will leave the d(5, 1) diff unchanged whereas the
d(5, 1) diff will lift the d(16, 1) diff. This diff cuts the original document s at position 16 but must
cut the document s * d(5, 1) at position 15 if it is to affect it in the same way. This time we write
$d(15, 1) = d(16, 1) \uparrow d(5, 1)$. As in the previous argument these two amended diffs, one lifted, the
other left unchanged, can then be applied to the respective documents with the end result being
the same:

s * d(5, 1) * d(15, 1) = s * d(16, 1) * d(5, 1) (2.2)
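
Identities (2.1) and (2.2) can be checked mechanically. The following is a minimal Python sketch, assuming zero-based positions and a tuple encoding ("i", n, s) or ("d", n, l) for diffs. The encoding and the names apply_diff and doc are illustrative assumptions and not part of the formal development.

```python
import string


def apply_diff(s, diff):
    """Apply one diff to the string s and return the new string."""
    kind, n, arg = diff
    if kind == "i":                 # i(n, text): insert text before position n
        return s[:n] + arg + s[n:]
    return s[:n] + s[n + arg:]      # d(n, l): delete l characters from position n


doc = string.ascii_letters          # 52 distinct characters stand in for s

# Identity (2.1): i(27, a) is lifted to i(28, a), i(12, a) is left unchanged.
lhs_21 = apply_diff(apply_diff(doc, ("i", 12, "a")), ("i", 28, "a"))
rhs_21 = apply_diff(apply_diff(doc, ("i", 27, "a")), ("i", 12, "a"))

# Identity (2.2): d(16, 1) is lifted to d(15, 1), d(5, 1) is left unchanged.
lhs_22 = apply_diff(apply_diff(doc, ("d", 5, 1)), ("d", 15, 1))
rhs_22 = apply_diff(apply_diff(doc, ("d", 16, 1)), ("d", 5, 1))

print(lhs_21 == rhs_21, lhs_22 == rhs_22)
```

Both comparisons hold, so each client ends with the same document whichever diff it applied first.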



Identities (2.1) and (2.2) are clearly instances of a more general rule and in what follows we show that
this rule holds under what are, for the moment, quite restrictive conditions. We begin by assuming
that the two diffs in question do not clash. In the case of deletes it is clear what this means. If
the parts of the document that are deleted overlap, the deletes are said to clash. In the case of
inserts, since the content of the document remains intact but is merely shifted, it could be argued
that clashes never occur. For the moment we adopt the same rules as those for deletes, however. If
two inserts overlap, they clash. Combinations of inserts and deletes are treated similarly. We can
now define lifting for diffs that do not clash.

Definition 2.1. 

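$$\begin{aligned}
i(n_2, s_2) \uparrow i(n_1, s_1) &= i(n_2 + |s_1|, s_2) & n_1 + |s_1| &\le n_2 \\
i(n_2, s_2) \uparrow d(n_1, l_1) &= i(n_2 - l_1, s_2) & n_1 + l_1 &\le n_2 \\
d(n_2, l_2) \uparrow i(n_1, s_1) &= d(n_2 + |s_1|, l_2) & n_1 + |s_1| &\le n_2 \\
d(n_2, l_2) \uparrow d(n_1, l_1) &= d(n_2 - l_1, l_2) & n_1 + l_1 &\le n_2
\end{aligned}$$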

Note that lifting is only defined when the diff to be lifted is strictly to the right of the diff that 
does the lifting. To continue, in order to formulate a general rule, we must define the concept of 
one diff affecting the other, even if the effect is to leave the other unchanged. 
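
As an illustration of definition 2.1, the sketch below implements lifting for diffs that do not clash. It assumes the same tuple encoding as before, ("i", n, s) or ("d", n, l), and reads the side condition as "the lifting diff ends at or before the start of the lifted diff". The function name lift and the use of None for the undefined case are assumptions made for the example.

```python
def size(diff):
    """Number of characters a diff inserts or deletes."""
    kind, _, arg = diff
    return len(arg) if kind == "i" else arg


def lift(lifted, lifter):
    """Return lifted ↑ lifter, or None when lifting is not defined.

    Lifting is only defined when `lifted` lies strictly to the right of
    `lifter`, that is when n1 + size(lifter) <= n2.
    """
    kind2, n2, arg2 = lifted
    kind1, n1, _ = lifter
    if n1 + size(lifter) > n2:
        return None
    # An insert pushes later positions right, a delete pulls them left.
    shift = size(lifter) if kind1 == "i" else -size(lifter)
    return (kind2, n2 + shift, arg2)


print(lift(("i", 27, "a"), ("i", 12, "a")))
print(lift(("d", 16, 1), ("d", 5, 1)))
print(lift(("i", 12, "a"), ("i", 27, "a")))
```

The first two calls reproduce the lifted diffs i(28, a) and d(15, 1) from the worked examples; the third is undefined because i(12, a) is to the left of i(27, a).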

Definition 2.2. 

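$$\begin{aligned}
i(n_1, s_1)(i(n_2, s_2)) &= \begin{cases} i(n_2, s_2) \uparrow i(n_1, s_1) & n_1 + |s_1| \le n_2 \\ i(n_2, s_2) & n_1 \ge n_2 + |s_2| \end{cases} \\
d(n_1, l_1)(i(n_2, s_2)) &= \begin{cases} i(n_2, s_2) \uparrow d(n_1, l_1) & n_1 + l_1 \le n_2 \\ i(n_2, s_2) & n_1 \ge n_2 + |s_2| \end{cases} \\
i(n_1, s_1)(d(n_2, l_2)) &= \begin{cases} d(n_2, l_2) \uparrow i(n_1, s_1) & n_1 + |s_1| \le n_2 \\ d(n_2, l_2) & n_1 \ge n_2 + l_2 \end{cases} \\
d(n_1, l_1)(d(n_2, l_2)) &= \begin{cases} d(n_2, l_2) \uparrow d(n_1, l_1) & n_1 + l_1 \le n_2 \\ d(n_2, l_2) & n_1 \ge n_2 + l_2 \end{cases}
\end{aligned}$$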

We define the substring s[n...m] to be the string formed by taking the n'th to the m'th characters of
the string s inclusive. We also make use of the abbreviations s[...m] = s[0...m] and s[n...] = s[n...|s|]
where |s| is the length of the string. We can now formulate a general rule.

Lemma 2.1. Assuming $(n_1 \ge n_2 + |s_2|) \lor (n_1 + |s_1| \le n_2)$, $(n_1 \ge n_2 + |s_2|) \lor (n_1 + l_1 \le n_2)$, etc,
we have, respectively:

$$\begin{aligned}
s * i(n_1, s_1) * i(n_1, s_1)(i(n_2, s_2)) &= s * i(n_2, s_2) * i(n_2, s_2)(i(n_1, s_1)) \\
s * d(n_1, l_1) * d(n_1, l_1)(i(n_2, s_2)) &= s * i(n_2, s_2) * i(n_2, s_2)(d(n_1, l_1)) \\
s * i(n_1, s_1) * i(n_1, s_1)(d(n_2, l_2)) &= s * d(n_2, l_2) * d(n_2, l_2)(i(n_1, s_1)) \\
s * d(n_1, l_1) * d(n_1, l_1)(d(n_2, l_2)) &= s * d(n_2, l_2) * d(n_2, l_2)(d(n_1, l_1))
\end{aligned}$$

Proof. We prove the first identity when $n_1 \ge n_2 + |s_2|$, which gives $i(n_1, s_1)(i(n_2, s_2)) = i(n_2, s_2)$
and $i(n_2, s_2)(i(n_1, s_1)) = i(n_1 + |s_2|, s_1)$. Then:

$$\begin{aligned}
s * i(n_1, s_1) * i(n_1, s_1)(i(n_2, s_2)) &= s * i(n_1, s_1) * i(n_2, s_2) \\
&= (s[...n_1 - 1] + s_1 + s[n_1...]) * i(n_2, s_2) \\
&= s[...n_2 - 1] + s_2 + s[n_2...n_1 - 1] + s_1 + s[n_1...] \\
&= (s[...n_2 - 1] + s_2 + s[n_2...]) * i(n_1 + |s_2|, s_1) \\
&= s * i(n_2, s_2) * i(n_2, s_2)(i(n_1, s_1))
\end{aligned}$$

The other seven cases are entirely similar. □

Note that this rule holds only when the two diffs are entirely separate, with one diff's effect on the
other not being defined when $n_1 < n_2 + |s_2|$ and $n_2 < n_1 + |s_1|$. We address this deficiency in the
next section.
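
Lemma 2.1 can also be checked by brute force on a short string. The sketch below is an illustration only. It assumes the tuple encoding ("i", n, s) or ("d", n, l), and it assumes that one diff either lifts the other (when it lies entirely to its left) or leaves it unchanged (when it lies entirely to its right). The names affect, all_diffs and check_lemma_2_1 are hypothetical.

```python
from itertools import product


def apply_diff(s, diff):
    kind, n, arg = diff
    return s[:n] + arg + s[n:] if kind == "i" else s[:n] + s[n + arg:]


def size(diff):
    kind, _, arg = diff
    return len(arg) if kind == "i" else arg


def affect(a, b):
    """Return a(b): b amended to follow a, or None if the two diffs clash."""
    kind2, n2, arg2 = b
    n1 = a[1]
    if n1 + size(a) <= n2:                    # a entirely to the left: lift b
        shift = size(a) if a[0] == "i" else -size(a)
        return (kind2, n2 + shift, arg2)
    if n1 >= n2 + size(b):                    # a entirely to the right: b unchanged
        return b
    return None                               # overlap, not covered by lemma 2.1


def all_diffs(s):
    """Every insert of "X" or "YZ" and every delete that fits inside s."""
    diffs = [("i", n, t) for n in range(len(s) + 1) for t in ("X", "YZ")]
    diffs += [("d", n, l) for n in range(len(s)) for l in range(1, len(s) - n + 1)]
    return diffs


def check_lemma_2_1(s="abcdefgh"):
    """Assert s * a * a(b) == s * b * b(a) for all non-clashing pairs."""
    checked = 0
    for a, b in product(all_diffs(s), repeat=2):
        ab, ba = affect(a, b), affect(b, a)
        if ab is None or ba is None:
            continue
        assert apply_diff(apply_diff(s, a), ab) == apply_diff(apply_diff(s, b), ba)
        checked += 1
    return checked


print(check_lemma_2_1() > 0)
```

Pairs that clash are skipped here, which mirrors the restriction stated in the note above.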

We now consider the two scenarios resulting from the differing orders in which the diffs are put
on the server, shown in figures 1 and 2, with the outer strands representing the client states and
the two inner strands representing the server state.

Figure 1: A put results in the put insert being amended

Figure 2: A put results in an existing insert being amended

The first scenario is shown in figure 1. Here the i(12, a) diff is put on the server first. When the
i(27, a) diff is put on the server, it must be lifted. The second scenario is shown in figure 2. Here
the i(27, a) diff is put on the server first. When the i(12, a) diff is put on the server, it is still the 
i(27, a) diff that must be lifted. To summarise, it is the i(27,a) diff that is lifted regardless of the 
order in which the diffs are put on the server. Assuming that the diffs do not clash, it is clear that 
when a diff is put on the server it must be compared to the existing diff on the other stack, if there 
is one, and that one or other must be lifted as a result. 

We have defined the effect of one diff on another as being either lifting or leaving it unchanged 
and we therefore conclude that when a diff is put on the server and a diff already exists there, 
both are affected, each by the other. Although this conclusion seems somewhat strained because 
at this point we have only considered the most simple cases, it turns out to be fundamental. We 
formalise it in a more abstract way before moving on. 
Figure 3: An abstract representation of diffs being put on the server



Let $\Delta$ and $\delta$ be arbitrary diffs. When we make the distinction at all, we say that $\Delta$ is put on the
server first. As a result of $\delta$ being put on the server we have concluded that both $\Delta$ and $\delta$ must
be amended, with $\Delta$ becoming $\delta(\Delta)$ and $\delta$ becoming $\Delta(\delta)$. Figure 3 shows the construction. We
are not particularly interested in the document that these diffs affect and so we omit this. As a
result of the amendments to the diffs we expect the following identity to hold:

$$*\Delta * \Delta(\delta) = *\delta * \delta(\Delta) \qquad (2.3)$$

We have proved that this identity holds in the most simple cases. In the next section we prove it
formally for all cases and, in the sections that follow, extend it to many diffs and many users.



3 A formal treatment for single diffs 

Let $\Sigma$ be a non-empty, finite set of characters from some alphabet, ranged over by $\sigma$. A string is
any finite or countable sequence of characters from $\Sigma$, ranged over by $s$, $s'$ and so on. The length
of a string $s$, written $|s|$, is the length of this sequence. The set of these strings is written $\Sigma^*$ and
the set of non-empty strings $\Sigma^+$. Substrings are as previously defined. The last character of a
string $s$ is written $s[-1]$ if the string is not of zero length. We write $\sigma^n$ for a string of identical
characters and $\sigma^\omega$ for such a string of infinite length. We write $s' + s''$ for the concatenation of
strings $s'$ and $s''$. We usually omit the $+$ sign for strings of the form $\sigma^n$.

Definition 3.1. The diffs $\Delta$, $\delta$ and so on range over the following set:

$$\{i(n, s) \mid n \in \mathbb{N}, s \in \Sigma^+\} \cup \{d(n, l) \mid n \in \mathbb{N}, l \in \mathbb{N}^+\}$$

Intuitively $i(n, s)$ is an insert, $d(n, l)$ a delete. We say that the insert $i(n, s)$ employs the string $s$.

Definition 3.2. The diff $\epsilon$ ranges over the following set:

$$\{\epsilon()\}$$

Intuitively $\epsilon()$ is the empty diff.
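Definition 3.3.

$$\begin{aligned}
\lfloor i(n, s) \rfloor &= n & \lceil i(n, s) \rceil &= n + |s| - 1 \\
\lfloor d(n, l) \rfloor &= n & \lceil d(n, l) \rceil &= n + l - 1
\end{aligned}$$

Definition 3.4.

$$\begin{aligned}
\Delta \sim \delta &\text{ iff } \lfloor\Delta\rfloor = \lfloor\delta\rfloor & \Delta \mathrel{!} \delta &\text{ iff } (\lfloor\Delta\rfloor \le \lceil\delta\rceil) \land (\lceil\Delta\rceil \ge \lfloor\delta\rfloor) \\
\Delta \ll \delta &\text{ iff } \lceil\Delta\rceil < \lfloor\delta\rfloor & \Delta < \delta &\text{ iff } (\lfloor\Delta\rfloor < \lfloor\delta\rfloor) \land (\lceil\Delta\rceil \ge \lfloor\delta\rfloor) \\
\Delta \gg \delta &\text{ iff } \lfloor\Delta\rfloor > \lceil\delta\rceil & \Delta > \delta &\text{ iff } (\lfloor\Delta\rfloor \le \lceil\delta\rceil) \land (\lfloor\Delta\rfloor > \lfloor\delta\rfloor)
\end{aligned}$$

Lemma 3.1. Either $\Delta \ll \delta$, $\Delta < \delta$, $\Delta \sim \delta$, $\Delta > \delta$ or $\Delta \gg \delta$. □

Lemma 3.2. $\Delta \mathrel{!} \delta$ if and only if $\Delta < \delta$, $\Delta \sim \delta$ or $\Delta > \delta$. □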
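
The five relations and the overlap relation can be sketched in code, which also gives a quick sanity check of lemmas 3.1 and 3.2. This is an illustration under assumptions: diffs are encoded as ("i", n, s) or ("d", n, l), the helper names left, right, relation and overlaps are hypothetical, and the ASCII strings "<<" and ">>" stand for the "far left" and "far right" relations.

```python
from itertools import product


def left(diff):
    """Leftmost position touched by a diff (the floor in definition 3.3)."""
    return diff[1]


def right(diff):
    """Rightmost position touched by a diff (the ceiling in definition 3.3)."""
    kind, n, arg = diff
    return n + (len(arg) if kind == "i" else arg) - 1


def relation(a, b):
    """Classify a against b; exactly one of five outcomes is returned."""
    if left(a) == left(b):
        return "~"
    if left(a) < left(b):
        return "<<" if right(a) < left(b) else "<"
    return ">>" if left(a) > right(b) else ">"


def overlaps(a, b):
    """The overlap relation: the two spans share at least one position."""
    return left(a) <= right(b) and right(a) >= left(b)


def check(limit=6):
    """Overlap holds exactly for the three 'near' relations (lemma 3.2)."""
    diffs = [("d", n, l) for n in range(limit) for l in range(1, limit)]
    diffs += [("i", n, "x" * l) for n in range(limit) for l in range(1, 3)]
    for a, b in product(diffs, repeat=2):
        assert overlaps(a, b) == (relation(a, b) in ("<", "~", ">"))
    return len(diffs)


print(relation(("i", 12, "a"), ("i", 27, "a")))
print(relation(("d", 3, 4), ("d", 5, 1)))
```

Because relation always returns one of the five labels, the classification is exhaustive and exclusive by construction, which is the content of lemma 3.1.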

We now extend lifting to include cases in which the diffs overlap. 

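Definition 3.5. For $n_1 \le n_2$:

$$\begin{aligned}
i(n_2, s_2) \uparrow i(n_1, s_1) &= i(n_2 + |s_1|, s_2) & d(n_2, l_2) \uparrow i(n_1, s_1) &= d(n_2 + |s_1|, l_2) \\
i(n_2, s_2) \uparrow d(n_1, l_1) &= i(n_2 - l_1, s_2) & d(n_2, l_2) \uparrow d(n_1, l_1) &= d(n_2 - l_1, l_2)
\end{aligned}$$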



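Recall that we want to assert identity (2.3) in all cases. To begin with we treat the cases in which
$\Delta$ and $\delta$ are both inserts.

Definition 3.6. When $\Delta$ and $\delta$ are both inserts with $\Delta = \delta$, $\Delta(\delta) = \delta$ and $\delta(\Delta) = \Delta$.

Lemma 3.3. When $\Delta$ and $\delta$ are both inserts with $\Delta = \delta$, $*\Delta * \Delta(\delta) = *\delta * \delta(\Delta)$. □

Suppose that $\Delta$ and $\delta$ employ the strings $s_1$ and $s_2$, respectively, and that these strings are of
equal length but not identical. We assume that there is some lexicographical ordering on $\Sigma^*$ so
that we can say either $s_1 < s_2$ or $s_1 > s_2$.

Definition 3.7. When $\Delta$ and $\delta$ are both inserts with $\Delta \sim \delta$, $\lceil\Delta\rceil = \lceil\delta\rceil$ but $\Delta \ne \delta$:

$$\Delta(\delta) = \begin{cases} \delta \uparrow \Delta & s_2 < s_1 \\ \delta & \text{otherwise} \end{cases} \qquad \delta(\Delta) = \begin{cases} \Delta \uparrow \delta & s_1 < s_2 \\ \Delta & \text{otherwise} \end{cases}$$

Intuitively we lift the lesser of the two diffs, lexicographically speaking, regardless of whether it is
$\Delta$ or $\delta$.

Lemma 3.4. When $\Delta$ and $\delta$ are both inserts with $\Delta \sim \delta$, $\lceil\Delta\rceil = \lceil\delta\rceil$ but $\Delta \ne \delta$,
$*\Delta * \Delta(\delta) = *\delta * \delta(\Delta)$. □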

In what follows, rather than consider the action of diffs on strings or resorting to the plethora of
symbols just defined, we use visual proofs. We consider the action of diffs on a string of eight
characters only. As an example we consider the case of two separate inserts, a case already covered
in lemma 2.1. The figure below shows the construction, with both inserts lined up behind the
string, ready to be applied to it. On the left, as $\Delta$ is applied first and since $\delta$ is strictly to the
right of $\Delta$, it must be lifted. In the final step, the lifted diff is then applied. On the right the diffs
are reversed, with the end result being the same. Note that this time $\Delta$ is not lifted, since it is
strictly to the left of $\delta$.

Figure 4: Two inserts

Employing specific strings does not clarify the proofs and instead the diffs are distinguished by 
their lengths. The fact that the end result is the same both left and right constitutes the proof. 
Although informal, these visual proofs capture the essence of each argument and as the cases 
become more subtle they become indispensable. We begin by proving the remaining cases for two 
inserts in this manner. 

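Definition 3.8. When $\Delta$ and $\delta$ are both inserts with $\Delta \ll \delta$ or $\Delta < \delta$, $\Delta(\delta) = (\delta \uparrow \Delta)$ and
$\delta(\Delta) = \Delta$.

Definition 3.9. When $\Delta$ and $\delta$ are both inserts with $\Delta > \delta$ or $\Delta \gg \delta$, $\Delta(\delta) = \delta$ and
$\delta(\Delta) = (\Delta \uparrow \delta)$.

Lemma 3.5. When $\Delta$ and $\delta$ are both inserts with $\Delta \ll \delta$, $\Delta < \delta$, $\Delta > \delta$ or $\Delta \gg \delta$,
$*\Delta * \Delta(\delta) = *\delta * \delta(\Delta)$.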



Proof. See figure 4. □
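
As an informal complement to the visual proof, the insert-only cases can be checked exhaustively on a short string. The sketch below is an assumption-laden illustration: inserts are encoded as ("i", n, s), the function name amend is hypothetical, and the lexicographic tie-break for inserts at the same position uses Python's string ordering and is applied to strings of any length, which goes slightly beyond the equal-length case treated in the text.

```python
from itertools import product


def apply_diff(s, diff):
    _, n, text = diff
    return s[:n] + text + s[n:]


def lift(lifted, lifter):
    """Shift an insert right by the length of the insert that precedes it."""
    _, n2, s2 = lifted
    _, _, s1 = lifter
    return ("i", n2 + len(s1), s2)


def amend(a, b):
    """Return a(b): the insert b amended so that it can be applied after a."""
    (_, n1, s1), (_, n2, s2) = a, b
    if n1 < n2:              # a starts to the left of b: lift b
        return lift(b, a)
    if n1 > n2:              # a starts to the right of b: b is unchanged
        return b
    # Same position: lift the lexicographically lesser insert only.
    return lift(b, a) if s2 < s1 else b


def check(s="abcdefgh", texts=("X", "Y", "YZ")):
    """Assert s * a * a(b) == s * b * b(a) for every pair of inserts."""
    inserts = [("i", n, t) for n in range(len(s) + 1) for t in texts]
    for a, b in product(inserts, repeat=2):
        lhs = apply_diff(apply_diff(s, a), amend(a, b))
        rhs = apply_diff(apply_diff(s, b), amend(b, a))
        assert lhs == rhs, (a, b)
    return len(inserts) ** 2


print(check())
```

The check covers identical inserts, inserts at the same position with different strings, and inserts at different positions.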

We now treat some of the cases when one diff is an insert and the other a delete.

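Definition 3.10. When $\Delta$ is an insert and $\delta$ a delete with $\Delta \ll \delta$, $\Delta \sim \delta$ or $\Delta < \delta$, $\Delta(\delta) = (\delta \uparrow \Delta)$
and $\delta(\Delta) = \Delta$.

Definition 3.11. When $\Delta$ is an insert and $\delta$ a delete with $\Delta \gg \delta$, $\Delta(\delta) = \delta$ and $\delta(\Delta) = (\Delta \uparrow \delta)$.

Lemma 3.6. When $\Delta$ is an insert and $\delta$ a delete with $\Delta \ll \delta$, $\Delta < \delta$, $\Delta \sim \delta$ or $\Delta \gg \delta$,
$*\Delta * \Delta(\delta) = *\delta * \delta(\Delta)$.

Proof. See figure 5. □

Definition 3.12. When $\Delta$ is a delete and $\delta$ an insert with $\Delta \ll \delta$, $\Delta(\delta) = (\delta \uparrow \Delta)$ and $\delta(\Delta) = \Delta$.

Definition 3.13. When $\Delta$ is a delete and $\delta$ an insert with $\Delta \gg \delta$, $\Delta \sim \delta$ or $\Delta > \delta$, $\Delta(\delta) = \delta$
and $\delta(\Delta) = (\Delta \uparrow \delta)$.

Lemma 3.7. When $\Delta$ is a delete and $\delta$ an insert with $\Delta \ll \delta$, $\Delta \sim \delta$, $\Delta > \delta$ or $\Delta \gg \delta$,
$*\Delta * \Delta(\delta) = *\delta * \delta(\Delta)$.

Proof. See figure 5. □

We now treat the two easiest cases when $\Delta$ and $\delta$ are both deletes.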

Definition 3.14. When $\Delta$ and $\delta$ are both deletes with $\Delta = \delta$, $\Delta(\delta) = \epsilon$ and $\delta(\Delta) = \epsilon$.

Definition 3.15. When $\Delta$ and $\delta$ are both deletes with $\Delta \ll \delta$, $\Delta(\delta) = (\delta \uparrow \Delta)$ and $\delta(\Delta) = \Delta$.

Definition 3.16. When $\Delta$ and $\delta$ are both deletes with $\Delta \gg \delta$, $\Delta(\delta) = \delta$ and $\delta(\Delta) = (\Delta \uparrow \delta)$.

Lemma 3.8. When $\Delta$ and $\delta$ are both deletes with $\Delta \ll \delta$ or $\Delta \gg \delta$, $*\Delta * \Delta(\delta) = *\delta * \delta(\Delta)$.

Proof. See figure 6. □

An additional definition is needed in order to treat the majority of the remaining cases.

Figure 5: An insert and a delete in either order

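Definition 3.17. When $\Delta$ and $\delta$ are both deletes with $\Delta < \delta$, $(\delta - \Delta)$ is such that $\lfloor\delta - \Delta\rfloor = \lceil\Delta\rceil + 1$,
$\lceil\delta - \Delta\rceil = \lceil\delta\rceil$ and when $\Delta > \delta$, $(\Delta - \delta)$ is such that $\lfloor\Delta - \delta\rfloor = \lceil\delta\rceil + 1$, $\lceil\Delta - \delta\rceil = \lceil\Delta\rceil$.

Intuitively, the portion of $\delta$ overlapping $\Delta$ is discarded in order to form $(\delta - \Delta)$ and vice versa.

Definition 3.18. When $\Delta$ and $\delta$ are both deletes with $\Delta < \delta$ and $\lceil\Delta\rceil < \lceil\delta\rceil$, $\Delta(\delta) = ((\delta - \Delta) \uparrow \Delta)$
and $\delta(\Delta) = (\Delta - \delta)$.

Definition 3.19. When $\Delta$ and $\delta$ are both deletes with $\Delta \sim \delta$ and $\lceil\Delta\rceil < \lceil\delta\rceil$, $\Delta(\delta) = ((\delta - \Delta) \uparrow \Delta)$
and $\delta(\Delta) = \epsilon$.

Definition 3.20. When $\Delta$ and $\delta$ are both deletes with $\Delta \sim \delta$ and $\lceil\Delta\rceil > \lceil\delta\rceil$, $\Delta(\delta) = \epsilon$ and
$\delta(\Delta) = ((\Delta - \delta) \uparrow \delta)$.

Definition 3.21. When $\Delta$ and $\delta$ are both deletes with $\Delta > \delta$ and $\lceil\Delta\rceil > \lceil\delta\rceil$, $\Delta(\delta) = (\delta - \Delta)$ and
$\delta(\Delta) = ((\Delta - \delta) \uparrow \delta)$.

Lemma 3.9. When $\Delta$ and $\delta$ are both deletes with $\Delta < \delta$ and $\lceil\Delta\rceil < \lceil\delta\rceil$; $\Delta \sim \delta$ and $\lceil\Delta\rceil < \lceil\delta\rceil$; $\Delta \sim \delta$
and $\lceil\Delta\rceil > \lceil\delta\rceil$ or $\Delta > \delta$ and $\lceil\Delta\rceil > \lceil\delta\rceil$, $*\Delta * \Delta(\delta) = *\delta * \delta(\Delta)$.

Proof. See figure 7. □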
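
The outcome of amending one delete against another can also be computed directly from the two spans, without naming the individual cases. The sketch below takes that shortcut, so it is not the case analysis given in the definitions: it simply removes from one delete whatever the other has already deleted and repositions the remainder. Deletes are encoded as ("d", n, l), the empty diff is represented by None, and the names amend and check are hypothetical.

```python
from itertools import product


def apply_diff(s, diff):
    if diff is None:                      # the empty diff leaves s unchanged
        return s
    _, n, l = diff
    return s[:n] + s[n + l:]


def amend(a, b):
    """Return a(b): the delete b amended so that it can follow the delete a."""
    (_, n1, l1), (_, n2, l2) = a, b
    a_end, b_end = n1 + l1, n2 + l2       # exclusive end positions
    overlap = max(0, min(a_end, b_end) - max(n1, n2))
    remaining = l2 - overlap              # characters of b not already deleted by a
    if remaining == 0:
        return None
    # What is left of b starts at n2 if b begins at or before a, otherwise it
    # is pulled left by a, but never further left than the start of a.
    start = n2 if n2 <= n1 else max(n1, n2 - l1)
    return ("d", start, remaining)


def check(s="abcdefgh"):
    """Assert s * a * a(b) == s * b * b(a) for every pair of deletes."""
    deletes = [("d", n, l) for n in range(len(s)) for l in range(1, len(s) - n + 1)]
    for a, b in product(deletes, repeat=2):
        lhs = apply_diff(apply_diff(s, a), amend(a, b))
        rhs = apply_diff(apply_diff(s, b), amend(b, a))
        assert lhs == rhs, (a, b)
    return len(deletes)


print(check())
```

Every pair of deletes on the eight-character string passes, including the nested cases in which one delete lies strictly inside the other.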
In order to treat all of the remaining cases, one further definition is needed. 

Definition 3.22. For diffs $d(n_1, l_1)$ or $i(n_1, s_1)$ together with $d(n_2, l_2)$, if $n_2 < n_1 \le n_2 + l_2 - 1$
we define:

$$d(n_2, l_2)^- = d(n_2, n_1 - n_2) \qquad d(n_2, l_2)^+ = d(n_1, l_2 - n_1 + n_2)$$

Figure 6: Two easy cases for two deletes

Intuitively, a delete is split by another delete or insert. In fact if the diff $\delta$ is split by $\Delta$ then
$\delta^-$ is no more than $(\delta - \Delta)$ with the $\delta^+$ part being the part of $\delta$ that is discarded. Note that
when defining the "double diff" $(\delta^-; \delta^+)$, the diff that splits $\delta$ is left implicit, but it is always clear
from the context. Once formed, the single diffs $\delta^+$ and $\delta^-$ that comprise the double diff are often
amended and therefore it is useful to consider the action of a double diff $(\delta'; \delta'')$ for more or less
arbitrary diffs $\delta'$ and $\delta''$.

Definition 3.23. For diffs $\delta'$ and $\delta''$, as long as $(\delta'' \uparrow \delta')$ is defined, we define:

$$*(\delta'; \delta'') = *\delta' * (\delta'' \uparrow \delta')$$

Lemma 3.10. If $(\delta^-; \delta^+)$ is formed from $\delta$ then $*(\delta^-; \delta^+) = *\delta$. □
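
Lemma 3.10 lends itself to a brute-force check. The sketch below is illustrative only and rests on one explicit assumption: the second half of a split delete is taken to start at the splitting position n1 in the original document, so that lifting it by the first half brings it back to n2. Deletes are encoded as ("d", n, l) and the names split, lift_delete and apply_double are hypothetical.

```python
def apply_diff(s, diff):
    _, n, l = diff
    return s[:n] + s[n + l:]


def split(delete, n1):
    """Split d(n2, l2) at position n1 into its minus and plus parts."""
    _, n2, l2 = delete
    if not n2 < n1 <= n2 + l2 - 1:
        raise ValueError("the splitting position must fall inside the delete")
    minus = ("d", n2, n1 - n2)            # the part before the split
    plus = ("d", n1, l2 - n1 + n2)        # the part from the split onwards (assumed to start at n1)
    return minus, plus


def lift_delete(lifted, lifter):
    """Lift one delete by another delete that starts at or before it."""
    _, n2, l2 = lifted
    _, _, l1 = lifter
    return ("d", n2 - l1, l2)


def apply_double(s, double):
    """Apply a double diff: the first part, then the second lifted by the first."""
    first, second = double
    return apply_diff(apply_diff(s, first), lift_delete(second, first))


def check(s="abcdefgh"):
    """Assert that every split delete acts exactly like the original delete."""
    count = 0
    for n2 in range(len(s)):
        for l2 in range(2, len(s) - n2 + 1):
            for n1 in range(n2 + 1, n2 + l2):
                delete = ("d", n2, l2)
                assert apply_double(s, split(delete, n1)) == apply_diff(s, delete)
                count += 1
    return count


print(check())
```

Under this reading the two halves together always remove exactly the characters removed by the original delete.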
We are now in a position to prove the remaining cases. 

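Definition 3.24. When $\Delta$ is an insert and $\delta$ a delete with $\Delta > \delta$, $\Delta(\delta) = (\delta^-; \delta^+ \uparrow \Delta)$ and
$\delta(\Delta) = (\Delta \uparrow \delta^-)$.

Definition 3.25. When $\Delta$ is a delete and $\delta$ an insert with $\Delta < \delta$, $\Delta(\delta) = (\delta \uparrow \Delta^-)$ and
$\delta(\Delta) = (\Delta^-; \Delta^+ \uparrow \delta)$.

Definition 3.26. When $\Delta$ and $\delta$ are both deletes with $\Delta > \delta$ and $\lceil\Delta\rceil < \lceil\delta\rceil$,
$\Delta(\delta) = (\delta^-; (\delta^+ - \Delta) \uparrow \Delta)$ and $\delta(\Delta) = \epsilon$.

Definition 3.27. When $\Delta$ and $\delta$ are both deletes with $\Delta < \delta$ and $\lceil\Delta\rceil > \lceil\delta\rceil$, $\Delta(\delta) = \epsilon$ and
$\delta(\Delta) = (\Delta^-; (\Delta^+ - \delta) \uparrow \delta)$.

Lemma 3.11. If $\Delta$ is an insert and $\delta$ a delete with $\Delta > \delta$; $\Delta$ is a delete and $\delta$ an insert with
$\Delta < \delta$; $\Delta$ and $\delta$ are both deletes with $\Delta < \delta$ and $\lceil\Delta\rceil > \lceil\delta\rceil$ or $\Delta$ and $\delta$ are both deletes with $\Delta > \delta$
and $\lceil\Delta\rceil < \lceil\delta\rceil$, then $*\Delta * \Delta(\delta) = *\delta * \delta(\Delta)$.

Proof. See figure 8. □

All the cases have now been covered.

Theorem 3.1. If $\Delta$ and $\delta$ are any two diffs, then $*\Delta * \Delta(\delta) = *\delta * \delta(\Delta)$. □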



4 An attempted treatment for many diffs

To begin with we redefine $\Delta$ and $\delta$ to be arrays of diffs, setting $\Delta = [\Delta_0, \Delta_1...]$ and $\delta = [\delta_0, \delta_1...]$
with $\Delta[0] = \Delta_0$, $\Delta[1] = \Delta_1$, $\delta[0] = \delta_0$, $\delta[1] = \delta_1$ and so on. Defining the action of an array of diffs
is straightforward and we note that the concept of lifting plays no part in this.

Definition 4.1.

$$s * [\Delta_0, \Delta_1...] = (s * \Delta_0) * \Delta_1 ...$$

Application is from left to right, for example $(s * \Delta_0) * \Delta_1$ is written $s * \Delta_0 * \Delta_1$, and we drop the
parentheses whenever possible from now on. It is helpful to couch the above definition in recursive
form.

Figure 7: Four harder cases for two deletes. The four panels establish:

Two deletes, $\Delta < \delta$, $\lceil\Delta\rceil < \lceil\delta\rceil$: $*\Delta * ((\delta - \Delta) \uparrow \Delta) = *\delta * (\Delta - \delta)$

Two deletes, $\Delta \sim \delta$, $\lceil\Delta\rceil > \lceil\delta\rceil$: $*\Delta * \epsilon = *\delta * ((\Delta - \delta) \uparrow \delta)$

Two deletes, $\Delta \sim \delta$, $\lceil\Delta\rceil < \lceil\delta\rceil$: $*\Delta * ((\delta - \Delta) \uparrow \Delta) = *\delta * \epsilon$

Two deletes, $\Delta > \delta$, $\lceil\Delta\rceil > \lceil\delta\rceil$: $*\Delta * (\delta - \Delta) = *\delta * ((\Delta - \delta) \uparrow \delta)$



Definition 4.2.

$$s * \Delta = \begin{cases} s * \Delta[0] & |\Delta| = 1 \\ s * \Delta[0] * \Delta[1...] & |\Delta| > 1 \end{cases}$$
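
The left-to-right and recursive forms of array application can be compared directly. The following is a minimal sketch, assuming diffs are encoded as ("i", n, s) or ("d", n, l) and held in a Python list; the function names are illustrative.

```python
from functools import reduce


def apply_diff(s, diff):
    kind, n, arg = diff
    return s[:n] + arg + s[n:] if kind == "i" else s[:n] + s[n + arg:]


def apply_all(s, diffs):
    """Left-to-right application, in the spirit of definition 4.1."""
    return reduce(apply_diff, diffs, s)


def apply_all_recursive(s, diffs):
    """Recursive form, in the spirit of definition 4.2 (non-empty arrays only)."""
    if len(diffs) == 1:
        return apply_diff(s, diffs[0])
    return apply_all_recursive(apply_diff(s, diffs[0]), diffs[1:])


example = [("i", 3, "XY"), ("d", 0, 2), ("i", 6, "Z")]
print(apply_all("abcdefgh", example))
print(apply_all_recursive("abcdefgh", example))
```

Both calls print cXYdefZgh. No lifting is involved because each diff in the array is already expressed against the document produced by its predecessors.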



We define lifting for arrays of diffs but must proceed with care. In particular, the following
definitions and lemmas are only valid when single diffs are involved.

Definition 4.3. Provided that $\delta_0$, $\Delta_0(\delta_0)$, $\Delta_1(\Delta_0(\delta_0))$ and so on are all single diffs, we define,
for $|\delta| = 1$:

$$[\Delta_0, \Delta_1...] \circ \delta = [...\Delta_1(\Delta_0(\delta_0))]$$

Note that the result is an array of diffs of length 1. Again it is helpful to couch this definition
recursively.

Definition 4.4. Provided that $\delta[0]$ and $\Delta[0](\delta[0])$ are single diffs, we define for $|\delta| = 1$:

$$\Delta \circ \delta = \begin{cases} [\Delta[0](\delta[0])] & |\Delta| = 1 \\ \Delta[1...] \circ [\Delta[0](\delta[0])] & |\Delta| > 1 \end{cases}$$

Definition 4.5. If $\Delta \circ \delta$ is defined, we define $\delta \circ \Delta$ for $|\delta| = 1$ and $|\Delta| \ge 1$ to be the array of diffs
such that the following identity holds:

$$*\Delta * \Delta \circ \delta = *\delta * \delta \circ \Delta$$

We can derive an explicit definition but again must proceed with care and avoid all double diffs.

Lemma 4.1. Provided that $\delta(\Delta[0])$ and $\Delta[0](\delta)$ are single diffs, the following definition holds for
$|\delta| = 1$:

$$\delta \circ \Delta = \begin{cases} [\delta[0](\Delta[0])] & |\Delta| = 1 \\ [\delta[0](\Delta[0])] + [\Delta[0](\delta[0])] \circ \Delta[1...] & |\Delta| > 1 \end{cases}$$

Figure 8: The [...] cases. The four panels establish:

Insert then delete, $\Delta > \delta$: $*\Delta * (\delta^-; \delta^+ \uparrow \Delta) = *(\delta^-; \delta^+) * (\Delta \uparrow \delta^-)$

Delete then insert, $\Delta < \delta$: $*(\Delta^-; \Delta^+) * (\delta \uparrow \Delta^-) = *\delta * (\Delta^-; \Delta^+ \uparrow \delta)$

Two deletes, $\Delta > \delta$, $\lceil\Delta\rceil < \lceil\delta\rceil$: $*\Delta * (\delta^-; (\delta^+ - \Delta) \uparrow \Delta) = *(\delta^-; \delta^+) * \epsilon$

Two deletes, $\Delta < \delta$, $\lceil\Delta\rceil > \lceil\delta\rceil$: $*\delta * (\Delta^-; (\Delta^+ - \delta) \uparrow \delta) = *(\Delta^-; \Delta^+) * \epsilon$

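Proof. By induction on the length of $\Delta$. Suppose $|\Delta| = 1$, then:

$$*\Delta * \Delta \circ \delta = *\Delta[0] * \Delta[0](\delta[0]) = *\delta[0] * \delta[0](\Delta[0]) = *\delta[0] * [\delta[0](\Delta[0])] = *\delta * \delta \circ \Delta$$

Suppose $|\Delta| > 1$, with induction hypothesis
$*\Delta[1...] * \Delta[1...] \circ [\Delta[0](\delta[0])] = *[\Delta[0](\delta[0])] * [\Delta[0](\delta[0])] \circ \Delta[1...]$. Then:

$$\begin{aligned}
*\Delta * \Delta \circ \delta &= *([\Delta[0]] + \Delta[1...]) * \Delta \circ [\delta[0]] \\
&= *([\Delta[0]] + \Delta[1...]) * \Delta[1...] \circ [\Delta[0](\delta[0])] \\
&= *\Delta[0] * \Delta[1...] * \Delta[1...] \circ [\Delta[0](\delta[0])] \\
&= *\Delta[0] * [\Delta[0](\delta[0])] * [\Delta[0](\delta[0])] \circ \Delta[1...] \\
&= *\Delta[0] * \Delta[0](\delta[0]) * [\Delta[0](\delta[0])] \circ \Delta[1...] \\
&= *\delta[0] * \delta[0](\Delta[0]) * [\Delta[0](\delta[0])] \circ \Delta[1...] \\
&= *[\delta[0]] * ([\delta[0](\Delta[0])] + [\Delta[0](\delta[0])] \circ \Delta[1...]) \\
&= *\delta * \delta \circ \Delta
\end{aligned}$$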

□ 
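
The recursion in definition 4.4 and lemma 4.1 unrolls into a single pass over the array. The sketch below illustrates this for inserts only, where every amended diff is guaranteed to be a single diff. It assumes the encoding ("i", n, s), a lexicographic tie-break for inserts at the same position, and the hypothetical names amend and through.

```python
from functools import reduce


def apply_diff(s, diff):
    _, n, text = diff
    return s[:n] + text + s[n:]


def apply_all(s, diffs):
    return reduce(apply_diff, diffs, s)


def amend(a, b):
    """Return a(b) for two inserts: b amended so that it can follow a."""
    (_, n1, s1), (_, n2, s2) = a, b
    if n1 < n2 or (n1 == n2 and s2 < s1):
        return ("i", n2 + len(s1), s2)    # b is lifted by a
    return b                              # b is left unchanged


def through(array, single):
    """Return the pair (array ∘ single, single ∘ array) for insert-only diffs."""
    amended_array = []
    for diff in array:
        amended_array.append(amend(single, diff))   # this diff, amended by single
        single = amend(diff, single)                # single, amended by this diff
    return [single], amended_array


s = "abcdefgh"
array = [("i", 2, "X"), ("i", 5, "Y"), ("i", 0, "Z")]
single = ("i", 4, "W")
array_then_single, single_then_array = through(array, single)

lhs = apply_all(apply_all(s, array), array_then_single)
rhs = apply_all(apply_diff(s, single), single_then_array)
print(lhs, rhs)
```

Both sides come out as ZabXcdYWefgh, an instance of the identity in definition 4.5.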

We now attempt to define $\Delta \circ \delta$ in the case when $\Delta[0]$ splits $\delta[0]$. To begin with we note that in
the case when both diffs are deletes and a double diff is necessary, it is in fact equivalent to a single
diff. We return to $\Delta$ and $\delta$ being single diffs in what follows.

Lemma 4.2. If $\Delta$ and $\delta$ are both deletes with $\Delta > \delta$ and $\lceil\Delta\rceil < \lceil\delta\rceil$ then there exists a $\delta'$ such that
$*\delta' = *(\delta^-; (\delta^+ - \Delta) \uparrow \Delta)$ and vice versa.

Proof. See figure 8. □

The remaining cases do result in genuine double diffs but lifting them can be simplified.

Lemma 4.3. In all cases $\Delta(\delta^-; \delta^+) = (\delta^-; \Delta(\delta^+))$ and vice versa.

Proof. See figure 8. □

We now attempt a definition. Suppose $|\Delta| = 1$:

$$\Delta \circ \delta = [\Delta[0](\delta[0])] = [\Delta[0](\delta[0]^-; \delta[0]^+)] = [(\delta[0]^-; \Delta[0](\delta[0]^+))]$$

Now suppose $|\Delta| > 1$:

$$\begin{aligned}
\Delta \circ \delta &= \Delta[1...] \circ [\Delta[0](\delta[0])] \\
&= \Delta[1...] \circ [\Delta[0](\delta[0]^-; \delta[0]^+)] \\
&= \Delta[1...] \circ [(\delta[0]^-; \Delta[0](\delta[0]^+))]
\end{aligned}$$

Here we must stop, having no definition for a term of the form $\Delta \circ (\delta'; \delta'')$ where $|\Delta| > 1$ and $\delta'$, $\delta''$
are arbitrary diffs. Lemma 4.2 tells us that $(\delta^-; \Delta[0](\delta^+))$ may be replaced by a single diff in the
case when $\Delta[0]$ and $\delta[0]$ are deletes but this is not the case in general. An alternative is to treat
the lifted double diff as an array of diffs and seek a definition for terms of the form $\Delta \circ \delta$ where
$|\Delta|, |\delta| > 1$. This is possible but again only by carefully avoiding all double diffs, a situation that
was hardly satisfactory in the first place.

5 A formal treatment for many diffs 

Instead we take a more abstract approach. We still consider arrays of diffs $\Delta$ and $\delta$ but forget
their specific action. We take the visual approach adopted earlier and refine it, calling the resultant
diagrams ladders. In the figure below, the ladder on the left represents the situation that we are
presented with initially, namely that both $\Delta$ and $\delta$ are applied to the same string $s$. On the other
hand, the ideal situation is that when $\delta$ is amended it can be applied not to $s$ but to $s * \Delta$. We
denote this amended diff by $\Delta.\delta$ and represent this situation by the ladder on the right. We make
it a rule that these two ladders, both representing true situations, equate to one another.

$$\begin{array}{c} \delta \\ s \\ \Delta \\ s \end{array} \;=\; \begin{array}{c} \Delta.\delta \\ s * \Delta \\ \Delta \\ s \end{array}$$

Since we are dealing with arrays of diffs we expect a rule relating to how they work and this is
easily formulated:

$$\begin{array}{c} \Delta' + \Delta'' \\ s \end{array} \;=\; \begin{array}{c} \Delta'' \\ s * \Delta' \\ \Delta' \\ s \end{array}$$

For the sake of completeness we add a rule to deal with empty diffs:

$$\begin{array}{c} \epsilon \\ s \\ \Delta \\ s \end{array} \;=\; \begin{array}{c} \Delta \\ s \end{array}$$

To demonstrate the utility of these ladders we derive two identities relating to empty diffs.

Lemma 5.1. $(\Delta + \epsilon).\delta = \Delta.\delta$.

Figure 9: $(\Delta + \epsilon).\delta = \Delta.\delta$

Figure 10: $\Delta.(\delta + \epsilon) = \Delta.\delta$

Proof. See figure 9. □

Lemma 5.2. $\Delta.(\delta + \epsilon) = \Delta.\delta$.

Proof. See figure 10. □



We note that theorem 3.1 gives us $*\Delta * \Delta.\delta = *\delta * \delta.\Delta$ in the case when $|\Delta| = |\delta| = 1$. Then, using only
the first two rules, we can derive expressions for $\Delta.\delta$ given that $\Delta$ and $\delta$ are of arbitrary length.
In what follows we often break down $\Delta$ and $\delta$ into $\Delta' + \Delta''$ and $\delta' + \delta''$ respectively and assume
$|\Delta'| = |\delta'| = 1$.

Figure 11: $\Delta.\delta = \Delta''.\Delta'.\delta$ when $\Delta = \Delta' + \Delta''$ and $|\delta| = 1$

Figure 12: $\Delta.\delta = \Delta.\delta' + (\delta'.\Delta).\delta''$ when $|\Delta| = 1$ and $\delta = \delta' + \delta''$

Lemma 5.3. $\Delta.\delta = \Delta''.\Delta'.\delta$ when $\Delta = \Delta' + \Delta''$ and $|\delta| = 1$.

Proof. See figure 11. □

Lemma 5.4. $\Delta.\delta = \Delta.\delta' + (\delta'.\Delta).\delta''$ when $|\Delta| = 1$ and $\delta = \delta' + \delta''$.

Proof. See figure 12. □

Lemma 5.3 is equivalent to definition 4.4 and lemma 5.4, in the derivation of which theorem 3.1
was used, to lemma 4.1. We note in passing that $|\Delta'.\delta|$ and $|\delta'.\Delta|$ may not necessarily be 1, but
this does not affect the veracity of the proofs.

In the case of $\Delta$ and $\delta$ both being of arbitrary length, the same approach can be used.

Figure 13: $\Delta.\delta = \Delta''.\Delta'.\delta' + ((\Delta'.\delta').\Delta'').(\delta'.\Delta').\delta''$ when $\Delta = \Delta' + \Delta''$ and $\delta = \delta' + \delta''$

Lemma 5.5. $\Delta.\delta = \Delta''.\Delta'.\delta' + ((\Delta'.\delta').\Delta'').(\delta'.\Delta').\delta''$ when $\Delta = \Delta' + \Delta''$ and $\delta = \delta' + \delta''$.

Proof. See figure 13. □

This derivation relies on the identity $\Delta'' + \Delta''.\Delta'.\delta' = \Delta'.\delta' + (\Delta'.\delta').\Delta''$, which is proved next.

Figure 14: $\Delta'' + \Delta''.\Delta'.\delta' = \Delta'.\delta' + (\Delta'.\delta').\Delta''$

Lemma 5.6. $\Delta'' + \Delta''.\Delta'.\delta' = \Delta'.\delta' + (\Delta'.\delta').\Delta''$.

Proof. See figure 14. □

We can now prove the main theorem.

Theorem 5.1. $*(\Delta' + \Delta'') * (\Delta' + \Delta'').(\delta' + \delta'') = *(\delta' + \delta'') * (\delta' + \delta'').(\Delta' + \Delta'')$.

Proof. See figure 15, which shows just over the first half of the derivation. The second half is the
same with $\Delta$ and $\delta$ interchanged. □
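
The derivations above suggest a recursive procedure for amending one array of diffs against another. The following sketch is one possible reading, restricted to inserts so that the base case always yields a single diff. The encoding ("i", n, s), the lexicographic tie-break and the names amend, dot, arrays and check are assumptions made for illustration, and always splitting off the first element is only one of the available choices.

```python
from functools import reduce
from itertools import product


def apply_diff(s, diff):
    _, n, text = diff
    return s[:n] + text + s[n:]


def apply_all(s, diffs):
    return reduce(apply_diff, diffs, s)


def amend(a, b):
    """Single-diff base case for inserts: b amended so that it can follow a."""
    (_, n1, s1), (_, n2, s2) = a, b
    if n1 < n2 or (n1 == n2 and s2 < s1):
        return ("i", n2 + len(s1), s2)
    return b


def dot(A, D):
    """Return the array D amended so that it can be applied after the array A."""
    if not A or not D:
        return list(D)
    if len(A) == 1 and len(D) == 1:
        return [amend(A[0], D[0])]
    if len(A) > 1:
        # Split A into its head and tail and amend D by each in turn.
        return dot(A[1:], dot(A[:1], D))
    # A is a single diff: amend the head of D, then amend the tail of D by
    # what A has become after the head of D.
    return dot(A, D[:1]) + dot(dot(D[:1], A), D[1:])


def arrays(s, texts=("X", "Y")):
    """All two-element arrays of inserts that are valid on s."""
    result = []
    for n0, t0 in product(range(len(s) + 1), texts):
        for n1, t1 in product(range(len(s) + len(t0) + 1), texts):
            result.append([("i", n0, t0), ("i", n1, t1)])
    return result


def check(s="abc"):
    """Assert the array identity for every pair of two-element insert arrays."""
    candidates = arrays(s)
    for A, D in product(candidates, repeat=2):
        lhs = apply_all(apply_all(s, A), dot(A, D))
        rhs = apply_all(apply_all(s, D), dot(D, A))
        assert lhs == rhs, (A, D)
    return len(candidates)


print(check())
```

The check passes for all pairs of two-element insert arrays on a three-character string. Handling deletes in the same way would require a base case that can return double diffs as arrays, which is not attempted here.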



6 The general case 

We have proved the identity $*\Delta * \Delta.\delta = *\delta * \delta.\Delta$ where $\Delta$ and $\delta$ are arrays of diffs of arbitrary
length. What we now prove is that if two clients have the same document initially, their documents
will thereafter remain synchronised whenever neither has pending diffs on the server. In order to
prove this, we make a reasonable assumption about any distributed application that employs this
algorithm, namely that clients cannot put diffs on the server before they get the document. The
proofs that follow are for two clients, but can easily be generalised to many.

Figure 15: $*(\Delta' + \Delta'') * (\Delta' + \Delta'').(\delta' + \delta'') = *(\delta' + \delta'') * (\delta' + \delta'').(\Delta' + \Delta'')$



Lemma 6.1. Immediately both clients have the document, they and the server have identical copies. 

Proof. Since clients cannot put diffs on the server before they get the document, immediately both 
clients have the document, only one can subsequently have put diffs on the server and hence made 
amends to it. Without loss of generality we assume client 1 gets their document first: 



[Diagram: client 1 holds $s' * \delta$; the server passes the document (DOC) to client 2, which then also holds $s' * \delta$.]

Note that there are no pending diffs for client 2. The observation that the shared document is
$s' * \delta$ for both clients and the server completes the proof. □

Lemma 6.2. Suppose that two clients initially share a document with the server. If these clients 
subsequently put diffs on the server, which immediately amends them appropriately and applies 
them to its document, then should either client get their pending diffs and apply them to their own 
document, the resulting document is identical to the one held on the server. 

Proof. We use an inductive argument, the base case of which must include pending diffs for both 
clients. Up until this point it is straightforward to check that the copies of the documents remain 
identical. Without loss of generality we assume that client 1 puts their diffs on the server first: 




Since $s * \delta * \delta.\Delta = s * \Delta * \Delta.\delta$ the base case is proved. We now set $\Delta' = \delta.\Delta$, $\delta' = \Delta.\delta$ and
$s' = s * \delta * \delta.\Delta$. The induction step consists of one client putting further diffs on the server, with
the induction hypothesis being $s * \delta * \Delta' = s' = s * \Delta * \delta'$. We can safely assume this is client 1,
since the case when client 2 puts further diffs on the server would be covered by the previous step:



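[Diagram: client 1 puts $\delta''$ on the server and, after getting $\delta''.\Delta'$, holds $s * \delta * \delta'' * \delta''.\Delta'$; the server
holds $s' * \Delta'.\delta''$; client 2, after getting $\delta' + \Delta'.\delta''$, holds $s * \Delta * (\delta' + \Delta'.\delta'')$.]

The following equalities make use of the identity $*\Delta * \Delta.\delta = *\delta * \delta.\Delta$ and the induction hypothesis:

$$\begin{aligned}
s * \delta * \delta'' * \delta''.\Delta' &= s * \delta * \Delta' * \Delta'.\delta'' \\
&= s' * \Delta'.\delta'' \\
&= s * \Delta * \delta' * \Delta'.\delta'' \\
&= s * \Delta * (\delta' + \Delta'.\delta'')
\end{aligned}$$

The observation that these equalities include the amended documents on either client should they
get their pending diffs as well as the amended document on the server completes the proof. □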



7 Conclusions 

We have shown that it is possible by some jiggery-pokery to prove the identity $*\Delta * \Delta.\delta = *\delta * \delta.\Delta$ in
all the cases when $\Delta$ and $\delta$ are single diffs. Generalising the result to arrays of diffs proved difficult
because of the unavoidable presence of double diffs. Specifically, an insert may split a delete.
To work around this problem we came up with ladders, which stressed the relations between the
various operators "*" and "+" without getting bogged down in the details. It is worth noting that
the resulting derivations do not rely on an inductive argument. In fact we can show convincingly
using this abstract approach that an inductive argument will never work. Suppose we want to
prove the aforementioned identity when $|\Delta| > 1$ and $|\delta| = 1$. If we proceed by induction on the
length of $\Delta$ we have the base case, and for the inductive step we set $\Delta = \Delta' + \Delta''$ where $|\Delta'| = 1$.
Our induction hypothesis is then $*\Delta'' * \Delta''.\delta = *\delta * \delta.\Delta''$. We now can only expand:

$$\begin{aligned}
*\Delta * \Delta.\delta &= *(\Delta' + \Delta'') * (\Delta' + \Delta'').\delta \\
&= *\Delta' * \Delta'' * \Delta''.\Delta'.\delta
\end{aligned}$$

But here we fail, as already pointed out. As $|\Delta'.\delta|$ may be 2, we cannot use our induction
hypothesis. Even worse, $|(\Delta'.\delta).\Delta''|$ may be nearly double the length of $\Delta''$. Inductive arguments
will clearly never work. By contrast, the abstract approach does work, and suggests a branching
algorithm that splits an array of diffs and deals with each sub-array separately. There is in fact
no need, when splitting $\Delta$ into $\Delta'$ and $\Delta''$, to set $|\Delta'| = 1$. Is there some optimisation to be had
from splitting $\Delta$ halfway, perhaps? Finally, we conclude with the observation that it is easy to
generalise this algorithm to more than two users by simply requiring that each user's diff gets put
on the stack of every other's.